Low-Overhead, High-Speed Multi-core Barrier Synchronization
نویسندگان
چکیده
Abstract. Whereas efficient barrier implementations were once a concern only in high-performance computing, recent trends in core integration make the topic relevant even for general-purpose CMPs. While the nature of CMP applications requires low-latency, the cost of low-latency barrier implementations using hardwarebased techniques can be prohibitive for CMPs, where die area represents opportunities for throughput and yield. Similarly, whereas traditional multiprocessor barrier implementations were developed primarily for dedicated environments, scheduling and multi-programming on CMPs require more adaptable barrier implementations. In this paper, we present and evaluate three barrier implementations that are hybrids of software and dedicated hardware barriers and are specifically tailored for CMPs. The implementations leverage the unique characteristics of CMPs and provide low latency comparable to that of dedicated hardware networks at a fraction of the cost. The implementations also support adaptability, enabling efficient multi-programming and dynamic remapping of the barrier network.
منابع مشابه
Area and Performance Optimization of Barrier Synchronization on Multi-core Network-on-Chips
Barrier synchronization is commonly and widely used to synchronize the execution of parallel processor cores on multi-core Network-on-Chips (NoCs). Since its global nature may cause heavy serialization resulting in large performance penalty, barrier synchronization should be carefully designed to have low latency communication and to minimize overall completion time. Therefore, in the paper, we...
متن کاملFastForward for Concurrent Threaded Pipelines
The performance, cost, and flexibility of commodity multi-core systems make them appealing for threaded applications. Unfortunately, popular threading techniques require independent code regions, use expensive synchronization primitives, and use expensive communication mechanisms. Recently, researchers have proposed several Concurrent Threaded Pipeline architectures (CTP) which relax the data i...
متن کاملFastForward for Concurrent Threaded Pipelines ; CU-CS-1023-07
The performance, cost, and flexibility of commodity multi-core systems make them appealing for threaded applications. Unfortunately, popular threading techniques require independent code regions, use expensive synchronization primitives, and use expensive communication mechanisms. Recently, researchers have proposed several Concurrent Threaded Pipeline architectures (CTP) which relax the data i...
متن کاملA Novel Synchronization Technique for Fast and Accurate Multi-core Instruction-set Simulation
This paper proposes a synchronization technique for fast and accurate Multi-Core Instruction-Set Simulation (MCISS). Traditionally, a lock-step approach, which synchronizes every cycle, is commonly used to achieve accurate simulation results of MCISS. However, this approach results in immense overhead and low simulation speed. Rather than synchronizing every cycle, our approach synchronizes the...
متن کاملAn Efficient Architectural Design of Hardware Interface for Heterogeneous Multi-core System
How to manage the message passing among inter processor cores with lower overhead is a great challenge when the multi-core system is the contemporary solution to satisfy high performance and low energy demands in general and embedded computing domains. Generally speaking, the networks-on-chip connects the distributed multi-core system. It takes charge of message passing which including data and...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2010